[data] Add better support for udf returns from list of datetime objects #46762

Merged
merged 3 commits into from
Jul 26, 2024

Conversation

omatthew98
Contributor

Why are these changes needed?

This PR adds better support for the translation of lists of datetime objects. It ensures that if pyarrow blocks are used then the datetime objects will be loaded as timestamps.

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Matthew Owen <mowen@anyscale.com>
@omatthew98 omatthew98 changed the title from "[data]" to "[data] Add better support for udf returns from list of datetime objects" Jul 24, 2024
Member

@bveeramani bveeramani left a comment

Seems reasonable to me

Comment on lines 47 to 55
if dt.microsecond != 0:
    highest_precision = 'datetime64[ns]'
    break
elif dt.second != 0:
    highest_precision = 'datetime64[s]'
elif dt.minute != 0:
    highest_precision = 'datetime64[m]'
elif dt.hour != 0:
    highest_precision = 'datetime64[h]'
Member

For my own understanding, is this list of precisions exhaustive?

Contributor Author

The full list is here: https://numpy.org/doc/stable/reference/arrays.datetime.html#datetime-units.

Omitting the lower-precision units (W, M, Y) is not a problem unless people need to represent times so distant that the extra range is needed (for context, day precision covers the Unix epoch +/- 2.5e16 years).
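
For anyone who wants to check the range figure, here is a rough back-of-the-envelope sketch. It assumes a datetime64 value is stored as a signed 64-bit count of its unit, and uses a mean Gregorian year of 365.2425 days; the helper name span_in_years is made up for this illustration and is not part of the PR.

```python
# Assumption: a datetime64 value is a signed 64-bit integer count of its unit,
# so the representable span on either side of the epoch is about 2**63 units.
INT64_MAX = 2**63 - 1
DAYS_PER_YEAR = 365.2425  # mean Gregorian year (approximation)


def span_in_years(units_per_day: float) -> float:
    """Approximate years representable on each side of the epoch.

    Hypothetical helper for illustration only.
    """
    return INT64_MAX / units_per_day / DAYS_PER_YEAR


# Day precision: one unit per day.
day_span = span_in_years(1)

# Microsecond precision: 86,400 seconds per day, 1,000,000 microseconds each.
us_span = span_in_years(24 * 60 * 60 * 1_000_000)

print(f"day precision:         +/- {day_span:.1e} years")
print(f"microsecond precision: +/- {us_span:.1e} years")
```

Under these assumptions the day-precision span comes out on the order of 2.5e16 years, which matches the figure quoted above, while microsecond precision still covers a few hundred thousand years around the epoch.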

As for the higher-precision units (ns, ps, fs, as), Python's datetime only has microsecond precision, so I don't think we need to support those in this function.
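
To make the discussion concrete, here is a minimal, self-contained sketch of how such a precision check could work. It is not the code merged in this PR: the names detect_highest_precision_sketch and _precision_rank are hypothetical, the day-level default is an assumption (the quoted snippet does not show the initial value), and the sketch tracks a rank so that a later, coarser datetime never lowers a precision already found. It mirrors the dtype strings from the quoted diff, including the mapping of non-zero microseconds to 'datetime64[ns]'.

```python
from datetime import datetime
from typing import List

# Ordered from coarsest to finest. The strings mirror the quoted diff;
# the day-level default is an assumption, not taken from the PR.
_PRECISIONS = [
    "datetime64[D]",
    "datetime64[h]",
    "datetime64[m]",
    "datetime64[s]",
    "datetime64[ns]",
]


def _precision_rank(dt: datetime) -> int:
    """Index into _PRECISIONS of the finest non-zero field of dt."""
    if dt.microsecond != 0:
        return 4
    if dt.second != 0:
        return 3
    if dt.minute != 0:
        return 2
    if dt.hour != 0:
        return 1
    return 0


def detect_highest_precision_sketch(datetime_list: List[datetime]) -> str:
    """Return the finest precision needed by any datetime in the list."""
    rank = 0
    for dt in datetime_list:
        rank = max(rank, _precision_rank(dt))
        if rank == len(_PRECISIONS) - 1:
            break  # Already at the finest supported precision.
    return _PRECISIONS[rank]


print(detect_highest_precision_sketch([datetime(2024, 7, 26)]))
print(detect_highest_precision_sketch([datetime(2024, 7, 26, 0, 0, 5), datetime(2024, 7, 26, 3)]))
```

Taking the maximum rank is a design choice of this sketch: the result does not depend on the order of the list, so a datetime with only a non-zero hour cannot overwrite a second-level precision found earlier.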

    return highest_precision

def _convert_datetime_list_to_array(datetime_list: List[datetime]) -> np.ndarray:
    # Detect highest precision
Member

Nit: This comment seems superfluous since the function is already named detect_highest_datetime_precision

Signed-off-by: Matthew Owen <mowen@anyscale.com>
@omatthew98 omatthew98 force-pushed the mowen/transform-date-list branch from 6092528 to f9e9b44 Compare July 25, 2024 19:22
Signed-off-by: Matthew Owen <mowen@anyscale.com>
@omatthew98 omatthew98 force-pushed the mowen/transform-date-list branch from c682e64 to a7c6b4d Compare July 25, 2024 20:35
@omatthew98 omatthew98 added the go add ONLY when ready to merge, run all tests label Jul 25, 2024
@bveeramani bveeramani merged commit 69f3218 into ray-project:master Jul 26, 2024
6 checks passed
Labels
go add ONLY when ready to merge, run all tests
2 participants